Abstract. We call a CNF formula linear if any two clauses have at most one variable in common. Let $m(k)$ be the largest integer $m$ such that any linear $k$-CNF formula with at most $m$ clauses is satisfiable. We show that $\frac{4^k}{4e^2k^3} \le m(k) \le \ln(2)\,k^4 4^k$.
More generally, a $(k,d)$-CSP is a constraint satisfaction problem in conjunctive normal form where each variable can take on one of $d$ values, and each constraint contains $k$ variables and forbids exactly one of the $d^k$ possible assignments to these variables. Call a $(k,d)$-CSP $\ell$-disjoint if no two distinct constraints have $\ell$ or more variables in common. Let $m_\ell(k,d)$ denote the largest integer $m$ such that any $\ell$-disjoint $(k,d)$-CSP with at most $m$ constraints is satisfiable. We show that

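$$\frac{1}{k}\left(\frac{d^{k-\ell+1}}{ek}\right)^{1+\frac{1}{\ell-1}} \le m_\ell(k,d) \le c\left(\frac{k^2}{\ell}\ln(d)\,d^k\right)^{1+\frac{1}{\ell-1}}$$

for some constant $c$. This means that for constant $\ell$, the upper and lower bounds differ only by a polynomial factor in $d$ and $k$.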
1 Introduction

How difficult is it to come up with an unsatisfiable CNF formula? Stupid question, you might think: $\{\{x\},\{\bar{x}\}\}$, here is one. Two clauses, each containing one literal, and unsatisfiable. Well, yes, but what if we want a $k$-CNF formula, i.e., we require that every clause contains exactly $k$ literals? Now it's a little bit less trivial, but still easy: take a clause $\{x_1, x_2, \ldots, x_k\}$, then $\{\bar{x}_1, x_2, \ldots, x_k\}$, $\{x_1, \bar{x}_2, \ldots, x_k\}$, and so on, until you have exhausted all $2^k$ combinations of negative and positive literals. Each assignment to the $k$ variables is ruled out by exactly one clause: your formula has $2^k$ clauses, and it is unsatisfiable. This formula is the "simplest" unsatisfiable $k$-CNF formula, in the same sense as $K_{k+1}$ is the simplest non-$k$-colorable graph. What if we impose further restrictions? For example, what if no variable can occur in more than one clause? This restriction is surely too strong: one can satisfy each clause individually, hence such a formula is always satisfiable, unless it contains the empty clause.
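The construction of the complete $k$-CNF formula can be checked mechanically. The following Python sketch is only an illustration; encoding a literal as a pair `(i, sign)`, with `sign = True` for $x_i$ and `False` for $\bar{x}_i$, and the helper names are our own assumptions.

```python
from itertools import product

# Encoding (an assumption of this sketch): a literal is (i, sign),
# where sign True means x_i and sign False means the negated variable.

def complete_kcnf(k):
    """All 2^k clauses over x_0..x_{k-1}, one clause per sign pattern."""
    return [tuple(enumerate(signs)) for signs in product([True, False], repeat=k)]

def clause_satisfied(clause, assignment):
    """A clause is satisfied if some literal (i, sign) has assignment[i] == sign."""
    return any(assignment[i] == sign for i, sign in clause)

def violated_clauses(formula, assignment):
    """The clauses of `formula` that `assignment` falsifies."""
    return [c for c in formula if not clause_satisfied(c, assignment)]

k = 3
F = complete_kcnf(k)
# Every assignment rules out exactly one clause, so F is unsatisfiable.
print(len(F), all(len(violated_clauses(F, a)) == 1
                  for a in product([True, False], repeat=k)))  # 8 True
```

An assignment falsifies a clause exactly when every sign in the clause disagrees with it, so the unique falsified clause is the one whose sign pattern is the complement of the assignment.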

Let us consider two weaker restrictions. First, what if each variable may occur in several clauses of our $k$-CNF formula, but in at most $d$? Let us call such a formula a $d$-bounded $k$-CNF formula. Second, what if we allow every pair of variables to occur in at most one clause, or, equivalently, allow any two clauses to have at most one variable in common? Such a formula is called, in analogy to hypergraph terminology, a linear $k$-CNF formula.

The first problem was introduced by Tovey [1], who showed, using Hall's Marriage Theorem, that every $k$-bounded $k$-CNF formula is satisfiable. This was improved by Kratochvíl, Savický and Tuza [2], who proved that there is some threshold function $f(k)$ such that any $f(k)$-bounded $k$-CNF formula is satisfiable, but deciding satisfiability of $(f(k)+1)$-bounded $k$-CNF formulas is already NP-complete, and further, that $f(k) \ge \lfloor 2^k/(ek) \rfloor$. For an upper bound on how often we can allow a variable to occur while still guaranteeing satisfiability, Hoory and Szeider [3] show how to construct unsatisfiable $d$-bounded $k$-CNF formulas for $d \in O(2^k \log k / k)$. Thus, $f(k)$ is known up to a logarithmic factor.

For the second question, let us give an unsatisfiable linear 2-CNF formula: 

$$\{\{\bar{u}, v\}, \{\bar{v}, w\}, \{\bar{w}, \bar{x}\}, \{x, u\}, \{\bar{u}, \bar{w}\}, \{v, \bar{x}\}\}.$$

This formula has 6 clauses, which is as few as possible for unsatisfiable linear 2-CNF formulas. Finding an unsatisfiable linear 3-CNF formula is already much harder. Hence we may ask the following question: for which $k$ do unsatisfiable linear $k$-CNF formulas exist, and if they exist, how many clauses do they have? The existence question has been answered by Porschen, Speckenmeyer and Zhao [4], who give an explicit construction of unsatisfiable linear $k$-CNF formulas for any $k$. However, the size of their formulas (i.e., the number of clauses) is gigantic: let $s(k)$ be the size of the unsatisfiable linear $k$-CNF formula obtained by the construction in [4]. Then $s(0) = 1$ and $s(k+1) = s(k)\,2^{s(k)}$. In this paper, we prove that much smaller unsatisfiable linear $k$-CNF formulas exist, namely of size $\mathrm{poly}(k)\,4^k$, and complement this by proving a lower bound of $\frac{4^k}{4e^2k^3}$. Since the smallest unsatisfiable $k$-CNF formula, which is non-linear, has exactly $2^k$ clauses, this shows that unsatisfiable linear formulas require significantly more clauses than non-linear ones.
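The small linear 2-CNF example can be verified by brute force. In this Python sketch, a literal is a pair `(variable, polarity)` with polarity `False` meaning the negated variable; this encoding and the helper names are ours. The negation pattern is also an assumption on our part: it is one choice of signs on the six variable pairs $\{u,v\},\{v,w\},\{w,x\},\{x,u\},\{u,w\},\{v,x\}$ that makes the formula unsatisfiable.

```python
from itertools import combinations, product

# ASSUMPTION: one sign pattern (polarity False = negated) that makes the
# six clauses on all pairs of {u, v, w, x} unsatisfiable.
FORMULA = [
    {("u", False), ("v", True)},
    {("v", False), ("w", True)},
    {("w", False), ("x", False)},
    {("x", True), ("u", True)},
    {("u", False), ("w", False)},
    {("v", True), ("x", False)},
]

def variables(clause):
    return {var for var, _ in clause}

def is_linear(formula):
    """Any two distinct clauses share at most one variable."""
    return all(len(variables(c) & variables(d)) <= 1
               for c, d in combinations(formula, 2))

def is_satisfiable(formula):
    """Brute force over all assignments of the variables that occur."""
    names = sorted(set().union(*(variables(c) for c in formula)))
    for values in product([False, True], repeat=len(names)):
        alpha = dict(zip(names, values))
        if all(any(alpha[v] == pol for v, pol in c) for c in formula):
            return True
    return False

print(is_linear(FORMULA), is_satisfiable(FORMULA))  # True False
```

By hand: $u = 1$ forces $v = 1$, then $w = 1$, which falsifies $\{\bar{u}, \bar{w}\}$; $u = 0$ forces $x = 1$, then $w = 0$, then $v = 0$, which falsifies $\{v, \bar{x}\}$.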

A similar problem has been investigated, and to a large extent solved, for hypergraphs: an $r$-hypergraph $\mathcal{H}$ is a hypergraph where every edge has $r$ vertices, and a proper $k$-coloring of $\mathcal{H}$ is a coloring of the vertices with $k$ colors such that no edge is monochromatic. A hypergraph is called linear if $|e_1 \cap e_2| \le 1$ for any two distinct edges $e_1, e_2$ of $\mathcal{H}$. It is easy to construct a non-$k$-colorable $r$-hypergraph, for any $k$ and $r$. However, it is not obvious whether non-$k$-colorable linear $r$-hypergraphs exist. For $k = 2$, this has been answered positively by Abbott [5]. For general $k$, existence follows from the Hales-Jewett theorem [6]. Since these arguments use Ramsey-like theorems, the obtained bounds on the size of $\mathcal{H}$ have been quite poor. Tight bounds, up to a constant factor, were later given by Kostochka, Mubayi, Rödl and Tetali [7], using probabilistic techniques.



1.1 Notation and Terminology 

Though we are primarily interested in linear $k$-CNF formulas, our methods apply to a much more general class, namely $(k,d)$-constraint satisfaction problems, or $(k,d)$-CSPs for short. This is basically the same as a $k$-CNF formula, except that each variable can take on one of $d$ different values, not just 2 as in the binary case. In this context, a literal is an inequality $x \ne b$, where $x$ is a variable and $b \in \{0, 1, \ldots, d-1\}$. A $k$-constraint is a set of $k$ literals, and a $(k,d)$-CSP is a set of $k$-constraints. An assignment is a mapping from variables to $\{0, 1, \ldots, d-1\}$. An assignment $\alpha$ satisfies a literal $x \ne b$ if, well, $\alpha(x) \ne b$. It satisfies a constraint if it satisfies at least one literal in it, and it satisfies a CSP if it satisfies every constraint of it. An issue that sometimes causes confusion is whether one allows a constraint to contain several literals involving the same variable. We do not. However, this is not important, since such a constraint, e.g., $\{x \ne 0, x \ne 1\}$, would be satisfied by every assignment anyway.
We say variable $x$ occurs in constraint $C$ if $C$ contains the literal $x \ne b$ for some $b \in \{0, 1, \ldots, d-1\}$. For a CSP $F$, we denote by $\deg(x, F)$ the number of constraints $C \in F$ in which $x$ occurs, and by $\mathrm{vbl}(C)$ the set of all variables occurring in constraint $C$. For example, $\mathrm{vbl}(\{x \ne 0, y \ne 1, z \ne 1\}) = \{x, y, z\}$. A CSP $F$ is called $\ell$-disjoint if there are no two distinct constraints $C, D \in F$ with $|\mathrm{vbl}(C) \cap \mathrm{vbl}(D)| \ge \ell$. Thus, a linear $k$-CNF formula is a 2-disjoint $(k,2)$-CSP.
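The definitions of this section translate directly into code. The following Python sketch assumes an encoding of our own choosing: a literal $x \ne b$ is the pair `(x, b)`, a constraint is a `frozenset` of literals, a CSP is a list of constraints, and an assignment is a `dict`. The function names mirror the notation $\mathrm{vbl}$ and $\deg$ but are otherwise hypothetical.

```python
from itertools import combinations

# Encoding (an assumption of this sketch): literal "x != b" is the pair (x, b),
# a constraint is a frozenset of literals, a CSP is a list of constraints.

def satisfies(alpha, csp):
    """alpha satisfies the CSP iff each constraint has a literal x != b with alpha[x] != b."""
    return all(any(alpha[x] != b for x, b in C) for C in csp)

def vbl(C):
    """The set of variables occurring in constraint C."""
    return {x for x, _ in C}

def deg(x, csp):
    """Number of constraints of the CSP in which x occurs."""
    return sum(1 for C in csp if x in vbl(C))

def is_l_disjoint(csp, l):
    """No two distinct constraints share l or more variables."""
    return all(len(vbl(C) & vbl(D)) < l for C, D in combinations(csp, 2))

C = frozenset({("x", 0), ("y", 1), ("z", 1)})
print(vbl(C) == {"x", "y", "z"})  # True
```

With $\ell = 2$ and $d = 2$, `is_l_disjoint` is exactly the linearity test for $k$-CNF formulas.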

1.2 Results 

Let $m(k)$ be the largest integer $m$ such that any linear $k$-CNF formula with at most $m$ clauses is satisfiable. For CSPs, let $m_\ell(k,d)$ denote the largest integer $m$ such that any $\ell$-disjoint $(k,d)$-CSP with at most $m$ constraints is satisfiable. Clearly $m_2(k,2) = m(k)$. Our main result is

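Theorem 1.1. There is some constant $c > 0$ such that

$$\frac{1}{k}\left(\frac{d^{k-\ell+1}}{ek}\right)^{1+\frac{1}{\ell-1}} \le m_\ell(k,d) \le c\left(\frac{k^2}{\ell}\ln(d)\,d^k\right)^{1+\frac{1}{\ell-1}}. \qquad (1)$$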
To understand these bounds, suppose $\ell$ is constant. Then the dominating term is $d^{k+\frac{k}{\ell-1}}$ in both the upper and the lower bound, and the two bounds differ only by a polynomial factor in $k$ and $d$. For linear $k$-CNF formulas ($\ell = 2$, $d = 2$), we obtain



$$\frac{4^k}{4e^2k^3} \le m(k) \le k^4 4^k. \qquad (2)$$
Compare this with the bound for general $(k,d)$-CSPs: the smallest unsatisfiable $(k,d)$-CSP has exactly $d^k$ constraints.
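The $d^k$ figure comes from the complete CSP that forbids every assignment to $k$ fixed variables, the $d$-valued analogue of the complete $k$-CNF formula from the introduction. A minimal Python sketch, assuming the pair/`frozenset` encoding (literal $x \ne b$ as `(x, b)`) and helper names of our own choosing, checks unsatisfiability and that no constraint can be dropped:

```python
from itertools import product

def complete_csp(k, d):
    """All d^k constraints {x_0 != b_0, ..., x_{k-1} != b_{k-1}} over variables 0..k-1."""
    return [frozenset(enumerate(bs)) for bs in product(range(d), repeat=k)]

def satisfies(alpha, csp):
    return all(any(alpha[x] != b for x, b in C) for C in csp)

def is_satisfiable(csp, k, d):
    """Brute force over all d^k assignments to the variables 0..k-1."""
    return any(satisfies(dict(enumerate(vals)), csp)
               for vals in product(range(d), repeat=k))

F = complete_csp(2, 3)
print(len(F), is_satisfiable(F, 2, 3))  # 9 False
# Dropping any one constraint makes the CSP satisfiable, so all d^k are needed.
print(all(is_satisfiable(F[:i] + F[i + 1:], 2, 3) for i in range(len(F))))  # True
```

Note that the brute-force check only covers CSPs on the $k$ variables $0, \ldots, k-1$; it is meant for this specific construction, not as a general solver.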

2 A Lower Bound 

Our main tool to prove a lower bound is the symmetric version of the Lovász Local Lemma (see, e.g., [8]):

Lemma 2.1 (Lovász Local Lemma). Let $\mathcal{E}_1, \ldots, \mathcal{E}_n$ be events in a probability space with $\Pr[\mathcal{E}_i] \le p$ for every $i$. If each event $\mathcal{E}_i$ is mutually independent of all other events except at most $D$ many, and $ep(D+1) \le 1$, then $\Pr\left[\bigcup_i \mathcal{E}_i\right] < 1$.

The following corollary states that any CSP is satisfiable unless some variable occurs "too often". This was shown in [2] for $d = 2$, and the proof there directly generalizes to arbitrary $d$.

Corollary 2.2. If $F$ is a $(k,d)$-CSP and $\deg(x, F) \le \frac{d^k}{ek}$ for every variable $x$, then $F$ is satisfiable.

Proof. Assign each variable a value from $\{0, 1, \ldots, d-1\}$ uniformly at random. Write $F = \{C_1, \ldots, C_m\}$ and let $\mathcal{E}_i$ be the event that constraint $C_i$ is not satisfied. Clearly $p := \Pr[\mathcal{E}_i] = d^{-k}$. Event $\mathcal{E}_i$ is mutually independent of all other events except those events $\mathcal{E}_j$ where $\mathrm{vbl}(C_i) \cap \mathrm{vbl}(C_j) \ne \emptyset$, i.e., those constraints sharing a variable with $C_i$. Since $\mathrm{vbl}(C_i)$ contains $k$ variables, and each occurs in at most $\frac{d^k}{ek} - 1$ other constraints, $C_i$ shares a variable with at most $k\left(\frac{d^k}{ek} - 1\right) \le e^{-1}d^k - 1 =: D$ other constraints. Since $ep(D+1) \le e \cdot d^{-k} \cdot e^{-1}d^k = 1$, Lemma 2.1 shows that with positive probability none of the events $\mathcal{E}_i$ occurs, i.e., $F$ is satisfiable. □
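The hypothesis of Corollary 2.2 is a plain degree condition, which the following Python sketch tests. It assumes the same encoding as elsewhere in our examples (literal $x \ne b$ as the pair `(x, b)`, constraints as `frozenset`s); the function names are hypothetical.

```python
import math

def vbl(C):
    return {x for x, _ in C}

def max_degree(csp):
    """Largest deg(x, F) over all variables x occurring in F."""
    counts = {}
    for C in csp:
        for x in vbl(C):
            counts[x] = counts.get(x, 0) + 1
    return max(counts.values(), default=0)

def corollary_2_2_applies(csp, k, d):
    """True if deg(x, F) <= d^k / (e k) for every variable x."""
    return max_degree(csp) <= d ** k / (math.e * k)

# Two constraints on disjoint variables: degree 1 <= 16 / (4e) ~ 1.47.
C1 = frozenset({(1, 0), (2, 0), (3, 0), (4, 0)})
C2 = frozenset({(5, 1), (6, 1), (7, 1), (8, 1)})
print(corollary_2_2_applies([C1, C2], k=4, d=2))  # True
```

The condition is vacuous unless $d^k \ge ek$: for $d = 2$, $k = 3$ the threshold $8/(3e) \approx 0.98$ is below 1, so no non-empty CSP passes the test.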
Let $F$ be a $(k,d)$-CSP. We call $x$ frequent in $F$ if $\deg(x, F) > \frac{d^{k-\ell+1}}{ek}$. Our idea is that an $\ell$-disjoint $(k,d)$-CSP with few frequent variables can be transformed into a $(k-\ell+1, d)$-CSP $F'$ having no frequent variable. By Corollary 2.2, $F'$ is satisfiable, and the transformation is such that $F$ is satisfiable, too.

Theorem 2.3. Any $\ell$-disjoint $(k,d)$-CSP with at most $\left(\frac{d^{k-\ell+1}}{ek}\right)^{\frac{1}{\ell-1}}$ frequent variables is satisfiable.

Proof. We obtain a new CSP $F'$ by removing certain literals from certain constraints. For each constraint $C \in F$, we distinguish two cases: if $C$ contains fewer than $\ell$ variables that are frequent in $F$, let $C'$ be $C$ minus all literals involving one of these frequent variables. Otherwise, let $C'$ just be $C$. We define $F' := \{C' \mid C \in F\}$. Observe that $F'$ contains constraints of different sizes, ranging from $k - \ell + 1$ to $k$. Further, for each constraint $C' \in F'$, the number of variables in $\mathrm{vbl}(C')$ that are frequent in $F$ is either $0$ or at least $\ell$.
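The construction of $F'$ can be sketched in Python as follows. The encoding (literal $x \ne b$ as `(x, b)`, constraints as `frozenset`s) and the names `frequent_variables` and `reduce_csp` are our own; `reduce_csp` takes the frequent set as a parameter so that it can also be tried on hand-picked sets.

```python
import math

def vbl(C):
    return {x for x, _ in C}

def frequent_variables(csp, k, d, l):
    """Variables x with deg(x, F) > d^(k-l+1) / (e k)."""
    threshold = d ** (k - l + 1) / (math.e * k)
    counts = {}
    for C in csp:
        for x in vbl(C):
            counts[x] = counts.get(x, 0) + 1
    return {x for x, c in counts.items() if c > threshold}

def reduce_csp(csp, frequent, l):
    """Build F' = {C' : C in F}: if C has fewer than l frequent variables,
    C' drops all literals on frequent variables; otherwise C' = C."""
    reduced = set()
    for C in csp:
        if len(vbl(C) & frequent) < l:
            reduced.add(frozenset((x, b) for x, b in C if x not in frequent))
        else:
            reduced.add(C)
    return reduced

F = [frozenset({("x", 0), ("y", 0), ("z", 0)}),
     frozenset({("x", 1), ("u", 0), ("v", 0)})]
print(reduce_csp(F, {"x"}, l=2))
```

Since every $C'$ is a subset of its $C$, the output never has larger constraints, which is what the rest of the proof relies on.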

We claim that $\deg(x, F') \le \frac{d^{k-\ell+1}}{ek}$ for every variable $x$. If $x$ is not frequent in $F$, this is obvious, since $\deg(x, F') \le \deg(x, F)$. If $x$ is frequent in $F$, let $C'_1, \ldots, C'_t$, where $t := \deg(x, F')$, be the constraints of $F'$ containing $x$. Since a frequent variable is only kept in constraints that contain at least $\ell$ frequent variables, each $C'_i$ equals the original constraint $C_i \in F$ and contains at least $\ell - 1$ variables besides $x$ which are frequent in $F$. We pick $\ell - 1$ of them arbitrarily and call this set $D_i$. Clearly $D_i \ne D_j$ for $i \ne j$; otherwise the $\ell$-set $D_i \cup \{x\}$ would occur in both $C_i$ and $C_j$, contradicting the $\ell$-disjointness of $F$. Let $n$ be the number of frequent variables in $F$. There are at most $\binom{n}{\ell-1}$ choices for an $(\ell-1)$-set of frequent variables, thus, using the assumption $n \le \left(\frac{d^{k-\ell+1}}{ek}\right)^{\frac{1}{\ell-1}}$,

$$\deg(x, F') = t \le \binom{n}{\ell-1} \le n^{\ell-1} \le \frac{d^{k-\ell+1}}{ek}.$$

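We would now like to apply Corollary 2.2 to $(k-\ell+1, d)$-CSPs. However, $F'$ is not a $(k-\ell+1, d)$-CSP, because it may still contain larger constraints. This is no problem, as we can further delete literals until every constraint has size exactly $k - \ell + 1$. This process clearly does not increase any $\deg(x, F')$. Since $\frac{d^{k-\ell+1}}{ek} \le \frac{d^{k-\ell+1}}{e(k-\ell+1)}$, Corollary 2.2 applies to the resulting $(k-\ell+1, d)$-CSP, which is therefore satisfiable. Each of its constraints is a subset of a constraint of $F$, so any assignment satisfying it also satisfies $F$. □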
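The final step, trimming every constraint to $k - \ell + 1$ literals, and the fact that satisfiability transfers back to $F$ can be illustrated with the Python sketch below. Which literals to keep is arbitrary in the proof; keeping the first ones in sorted order is our choice, and the helper names are hypothetical.

```python
from itertools import product

def truncate(csp, size):
    """Keep `size` literals of each constraint (here: the first ones in sorted order)."""
    return {frozenset(sorted(C)[:size]) for C in csp}

def satisfies(alpha, csp):
    return all(any(alpha[x] != b for x, b in C) for C in csp)

def check_monotone(csp, smaller, variables, d):
    """Brute force: every assignment satisfying `smaller` also satisfies `csp`."""
    for values in product(range(d), repeat=len(variables)):
        alpha = dict(zip(variables, values))
        if satisfies(alpha, smaller) and not satisfies(alpha, csp):
            return False
    return True

F = [frozenset({("a", 0), ("b", 1), ("c", 0)}),
     frozenset({("a", 1), ("c", 1), ("d", 0)})]
G = truncate(F, 2)
print(check_monotone(F, G, ["a", "b", "c", "d"], 2))  # True
```

The check succeeds for any choice of kept literals, because a satisfied literal of a trimmed constraint is also a literal of the original constraint.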

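Proof of the lower bound in Theorem 1.1. Assume $F$ is an unsatisfiable $\ell$-disjoint $(k,d)$-CSP. Then by Theorem 2.3, $F$ has more than $\left(\frac{d^{k-\ell+1}}{ek}\right)^{\frac{1}{\ell-1}}$ frequent variables, each of which occurs in more than $\frac{d^{k-\ell+1}}{ek}$ constraints. Since

$$\sum_{C \in F} |C| = \sum_{x} \deg(x, F)$$

and $|C| = k$ for all $C \in F$, it follows that $F$ has more than $\frac{1}{k}\left(\frac{d^{k-\ell+1}}{ek}\right)^{1+\frac{1}{\ell-1}}$ constraints.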
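To see how this lower bound behaves numerically, the following Python sketch (function names are ours) evaluates $\frac{1}{k}\left(\frac{d^{k-\ell+1}}{ek}\right)^{1+\frac{1}{\ell-1}}$ and its specialisation $\frac{4^k}{4e^2k^3}$ to linear $k$-CNF formulas ($\ell = 2$, $d = 2$), and compares the latter with $2^k$.

```python
import math

def theorem_1_1_lower(k, d, l):
    """(1/k) * (d^(k-l+1) / (e k))^(1 + 1/(l-1)); requires l >= 2."""
    base = d ** (k - l + 1) / (math.e * k)
    return base ** (1 + 1 / (l - 1)) / k

def linear_kcnf_lower(k):
    """Specialisation to linear k-CNF formulas (l = 2, d = 2): 4^k / (4 e^2 k^3)."""
    return 4 ** k / (4 * math.e ** 2 * k ** 3)

for k in (10, 20, 30):
    # 2^k is the size of the smallest unsatisfiable k-CNF formula
    print(k, round(linear_kcnf_lower(k)), 2 ** k)
```

For $k = 10$ the bound is still below $2^{10}$, while for $k = 20$ it exceeds $2^{20}$, so the separation from general $k$-CNF formulas only shows up for moderately large $k$.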

3 The Upper Bound 

In this section we complement our lower bound with an upper bound. The ratio of upper and lower bound will be polynomial in $k$ and $d$, but the degree of the polynomial will depend on $\ell$.

The proof of the upper bound uses the first moment method and proceeds in two steps. First, we show that for given $n$, $k$, $d$ and $\ell$, we can find an $\ell$-disjoint $(k,d)$-CSP $F$ over $n$ variables with "many" constraints. In a second step, we replace each literal $x \ne b$ in each constraint of $F$ by $x \ne b'$, where $b'$ is each time chosen independently and uniformly at random from $\{0, 1, \ldots, d-1\}$, resulting in a random $\ell$-disjoint $(k,d)$-CSP $F'$. We will show that for the right values of $n$, $F'$ is unsatisfiable with positive probability.

As long as we do not care about the values b in the literals, a CSP is basically nothing 
more than a hypergraph. 

Lemma 3.1. Let $\ell \le k \le n$. There exists an $\ell$-disjoint $k$-uniform hypergraph on $n$ vertices with at least

$$m := \frac{\binom{n}{\ell}}{\binom{k}{\ell}^2}$$

edges.



Proof. We will actually prove something stronger. Let $S$ be the set of all $k$-subsets of $\{1, \ldots, n\}$. We claim that any maximal $\ell$-disjoint subfamily $\mathcal{H} \subseteq S$ has at least $m$ sets. Suppose $\mathcal{H} \subseteq S$ is maximal. For $A, B \in S$, we say $A$ is incompatible with $B$ if $|A \cap B| \ge \ell$. Note that by this definition, $A$ is incompatible with itself. By maximality of $\mathcal{H}$, each $A \in S$ is incompatible with some $B \in \mathcal{H}$. For each $B \in \mathcal{H}$, there are at most

$$\binom{k}{\ell}\binom{n-\ell}{k-\ell}$$

sets $A \in S$ incompatible with $B$: each fixed $\ell$-subset of $B$ is contained in $\binom{n-\ell}{k-\ell}$ $k$-subsets of $\{1, \ldots, n\}$, and $B$ contains $\binom{k}{\ell}$ such $\ell$-subsets. Hence $|S| = \binom{n}{k} \le \binom{k}{\ell}\binom{n-\ell}{k-\ell}|\mathcal{H}|$, and the claim follows from the identity $\binom{n}{k}\binom{k}{\ell} = \binom{n}{\ell}\binom{n-\ell}{k-\ell}$. □
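The maximality argument suggests a simple way to produce such a hypergraph: scan all $k$-subsets and keep each one that is compatible with everything kept so far; the result is a maximal $\ell$-disjoint family. The lexicographic scan order and the function name are our own choices in this Python sketch; the proof works for any maximal family.

```python
from itertools import combinations
from math import comb

def greedy_l_disjoint(n, k, l):
    """Scan the k-subsets of {1..n} lexicographically and keep those sharing
    fewer than l elements with every set kept so far (a maximal family)."""
    family = []
    for A in combinations(range(1, n + 1), k):
        if all(len(set(A) & set(B)) < l for B in family):
            family.append(A)
    return family

n, k, l = 9, 3, 2
H = greedy_l_disjoint(n, k, l)
# Lemma 3.1 guarantees at least binom(n, l) / binom(k, l)^2 = 4.0 sets here.
print(len(H), comb(n, l) / comb(k, l) ** 2)
```

This takes time polynomial in $\binom{n}{k}$, so it is a sanity check for small parameters rather than an efficient construction.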

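We bound $m$, the size of the $\ell$-disjoint $k$-uniform hypergraph on $n$ vertices, from below by a formula that will be easier to work with. Using $\binom{n}{\ell}/\binom{k}{\ell} \ge (n/k)^\ell$ for $n \ge k$ and $\binom{k}{\ell} \le (ek/\ell)^\ell$, we get

$$m = \frac{\binom{n}{\ell}}{\binom{k}{\ell}^2} \ge \left(\frac{n}{k}\right)^{\ell}\binom{k}{\ell}^{-1} \ge \left(\frac{n}{k}\right)^{\ell}\left(\frac{\ell}{ek}\right)^{\ell} = \left(\frac{\ell n}{ek^2}\right)^{\ell}. \qquad (3)$$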
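Inequality (3) can be spot-checked numerically. This Python sketch (function names are ours) compares the exact guarantee $\binom{n}{\ell}/\binom{k}{\ell}^2$ with the simpler bound $\left(\frac{\ell n}{ek^2}\right)^{\ell}$ on a grid of small parameters with $2 \le \ell \le k \le n$; it is a sanity check, not a proof.

```python
from math import comb, e

def lemma_3_1_size(n, k, l):
    """The guaranteed size binom(n, l) / binom(k, l)^2 from Lemma 3.1."""
    return comb(n, l) / comb(k, l) ** 2

def bound_3(n, k, l):
    """The simpler lower bound (l n / (e k^2))^l."""
    return (l * n / (e * k * k)) ** l

ok = all(lemma_3_1_size(n, k, l) >= bound_3(n, k, l)
         for k in range(2, 8) for l in range(2, k + 1) for n in range(k, 80))
print(ok)  # True
```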



We can obtain a $(k,d)$-CSP over variable set $V = \{x_1, \ldots, x_n\}$ from a $k$-uniform hypergraph over vertex set $\{v_1, \ldots, v_n\}$ by simply replacing each edge $\{v_{i_1}, \ldots, v_{i_k}\}$ by a constraint $\{x_{i_1} \ne b_1, \ldots, x_{i_k} \ne b_k\}$, where we sample each $b_j$ independently and uniformly at random from $\{0, \ldots, d-1\}$. We obtain a random CSP $F$. Any fixed assignment $\alpha$ violates a random constraint with probability $d^{-k}$, and the random constraints are chosen independently. Hence $\alpha$ satisfies $F$ with probability $(1 - d^{-k})^m$, where $m = |F|$ is the number of constraints. Using $1 - t \le e^{-t}$, the expected number of satisfying assignments of $F$ is

$$\sum_{\alpha : V \to \{0, \ldots, d-1\}} \Pr[\alpha \text{ satisfies } F] = d^n (1 - d^{-k})^m \le e^{\ln(d)n - d^{-k}m}. \qquad (4)$$
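The random instantiation and the expectation in (4) can be sketched in Python. Vertices are the integers $0, \ldots, n-1$, and the helper names, as well as the use of `random.Random` for reproducibility, are our own choices. `exact_expectation` averages over all $d^{km}$ choices of the $b_j$, so it is only usable on tiny instances, but it confirms the formula $d^n(1 - d^{-k})^m$ exactly.

```python
import math
import random
from itertools import product

def random_csp(edges, d, rng):
    """Turn each edge {v_i1..v_ik} into {x_i1 != b_1, ..., x_ik != b_k}, b_j uniform."""
    return [frozenset((v, rng.randrange(d)) for v in sorted(edge)) for edge in edges]

def expected_satisfying(n, d, k, m):
    """d^n (1 - d^-k)^m, the expected number of satisfying assignments."""
    return d ** n * (1 - d ** (-k)) ** m

def exponential_bound(n, d, k, m):
    """e^(ln(d) n - d^-k m), using 1 - t <= e^-t."""
    return math.exp(math.log(d) * n - m / d ** k)

def exact_expectation(edges, n, d):
    """Average number of satisfying assignments over all choices of the b's
    (exponential time; tiny instances only)."""
    slots = [(i, v) for i, edge in enumerate(edges) for v in sorted(edge)]
    total = count = 0
    for bs in product(range(d), repeat=len(slots)):
        csp = [set() for _ in edges]
        for (i, v), b in zip(slots, bs):
            csp[i].add((v, b))
        for vals in product(range(d), repeat=n):
            total += all(any(vals[v] != b for v, b in C) for C in csp)
        count += 1
    return total / count

edges = [(0, 1), (1, 2)]
print(exact_expectation(edges, 3, 2), expected_satisfying(3, 2, 2, len(edges)))  # 4.5 4.5
print(random_csp(edges, 2, random.Random(0)))
```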

If we can choose $n$ and $m$ such that the latter term is $< 1$, then with positive probability, $F$ is not satisfiable. We rewrite this condition:

$$\ln(d)\,n - d^{-k}m < 0 \iff m > \ln(d)\,n\,d^k.$$

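Combining this with (3), we see that it suffices to choose $n$ such that

$$\left(\frac{\ell n}{ek^2}\right)^{\ell} > \ln(d)\,n\,d^k, \quad\text{i.e.,}\quad n^{\ell-1} > \ln(d)\left(\frac{ek^2}{\ell}\right)^{\ell} d^k,$$

and we choose

$$n := \left\lfloor \left(\ln(d)\left(\frac{ek^2}{\ell}\right)^{\ell} d^k\right)^{\frac{1}{\ell-1}} \right\rfloor + 1.$$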

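We set $m := \lfloor \ln(d)\,n\,d^k \rfloor + 1$. By (3) and the choice of $n$, the hypergraph of Lemma 3.1 has more than $\ln(d)\,n\,d^k$, hence at least $m$, edges, and keeping any $m$ of them preserves $\ell$-disjointness. Since $n \le \left(\ln(d)\left(\frac{ek^2}{\ell}\right)^{\ell} d^k\right)^{\frac{1}{\ell-1}} + 1$, there is some constant $c$ such that

$$m \le c\left(\frac{ek^2}{\ell}\right)^{\frac{\ell}{\ell-1}}\left(\ln(d)\,d^k\right)^{\frac{\ell}{\ell-1}} = c\left(\frac{ek^2}{\ell}\ln(d)\,d^k\right)^{1+\frac{1}{\ell-1}}.$$

With these values of $n$ and $m$, the rightmost term in (4) is $< 1$, and thus with positive probability the random $(k,d)$-CSP $F$ has no satisfying assignment. Since $e^{\ell/(\ell-1)} \le e^2$, the factor $e$ can be absorbed into the constant, which finishes the proof of Theorem 1.1. □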
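The parameter choice in this proof can be made concrete. The Python sketch below (the function name is ours) computes $n$ and $m$ as defined above, with a small loop guarding against floating-point rounding in the root, and then checks the three facts the proof uses: $m > \ln(d)\,n\,d^k$, Lemma 3.1 supplies at least $m$ edges, and $m$ is within a constant factor of $\left(\frac{ek^2}{\ell}\ln(d)\,d^k\right)^{1+\frac{1}{\ell-1}}$ (the factor 3 below is our own generous choice, not the paper's constant).

```python
import math

def upper_bound_parameters(k, d, l):
    """n with n^(l-1) > ln(d) (e k^2 / l)^l d^k, and m := floor(ln(d) n d^k) + 1."""
    target = math.log(d) * (math.e * k * k / l) ** l * d ** k
    n = math.floor(target ** (1 / (l - 1))) + 1
    while n ** (l - 1) <= target:  # guard against floating-point rounding
        n += 1
    m = math.floor(math.log(d) * n * d ** k) + 1
    return n, m

for k, d, l in [(3, 2, 2), (4, 3, 3), (6, 2, 3)]:
    n, m = upper_bound_parameters(k, d, l)
    edges_available = -(-math.comb(n, l) // math.comb(k, l) ** 2)  # ceiling
    base = (math.e * k * k / l) * math.log(d) * d ** k
    print(k, d, l, n, m,
          math.log(d) * n - m / d ** k < 0,   # exponent in (4) is negative
          edges_available >= m,               # Lemma 3.1 gives enough edges
          m <= 3 * base ** (l / (l - 1)))     # m is O(base^(1 + 1/(l-1)))
```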



4 Conclusions and Open Problems 

We determined the value of $m_\ell(k,d)$ up to a factor that is, for constant $\ell$, polynomial in $k$ and $d$. Can one eliminate the exponential factor $d^{-\ell+1}$ in the lower bound?

Further, we do not have any good explicit construction of unsatisfiable linear $k$-CNF formulas. Can one derandomize our randomized construction? Our lower bound suffers from a similar problem: given an $\ell$-disjoint $(k,d)$-CSP $F$ with at most $\left(\frac{d^{k-\ell+1}}{ek}\right)^{\frac{1}{\ell-1}}$ frequent variables, we know that $F$ is satisfiable, but we do not know how to find a satisfying assignment in polynomial time.

Last, can one obtain any good lower bound on $m_\ell(k,d)$ that does not use the Lovász Local Lemma?